Abstract

Recently, a rigorous yet concise formula has been derived to evaluate the information flow, and
hence the causality in a quantitative sense, between time series. To assess the importance of a
resulting causality, it needs to be normalized. The normalization is achieved by distinguishing
three types of fundamental mechanisms that govern the marginal entropy change of the flow
recipient. A normalized, or relative, flow measures its importance relative to the other mechanisms.
In analyzing realistic series, both absolute and relative information flows need to be taken into
account, since the normalizers for a pair of reverse flows belong to two different entropy balances;
it is quite normal that two identical flows differ a lot in relative importance in their respective
balances. We have reproduced these results with several autoregressive models. We have also
shown applications to a climate change problem and a financial analysis problem. For the former,
the role of the Indian Ocean Dipole as an uncertainty source for El Niño prediction is reconfirmed.
This might partly account for the unpredictability of certain aspects of El Niño that has led
to the recent portentous but spurious forecasts of the 2014 “Monster El Niño”. For the latter, an
unusually strong one-way causality has been identified from IBM (International Business Machines
Corporation) to GE (General Electric Company) in their early era, revealing to us an old story,
which has almost sunk into oblivion, about “Seven Dwarfs” competing with a giant for the mainframe
computer market.
I. INTRODUCTION


Information flow, or information transfer as it may be referred to in the literature, has
long been recognized as the appropriate measure of causality between dynamical events [1]. It
possesses the asymmetry, or directionalism, needed for a cause-effect relation and, moreover,
provides a quantitative characterization of what is otherwise a statistical test, e.g., the Granger
causality test [2]. For this reason, the past decades have seen a surge of interest in this
arena of research. Measures of information flow proposed thus far include, for example, the
time-delayed mutual information [3], transfer entropy [4], momentary information transfer [5],
and causation entropy [6], among which transfer entropy has been proved to be equivalent
to Granger causality up to a factor of 2 for linear systems [7].

Recently, Liang and Kleeman found that the notion of information flow can actually be
put on a rigorous footing within a given deterministic system [8]. The basic idea is best
illustrated with a system of two components, say, $X_1$ and $X_2$. The problem essentially
deals with how the marginal entropies of $X_1$ and $X_2$, written respectively as $H_1$ and $H_2$,
evolve. Take $H_1$ for example. Its evolution could be due to $X_1$ itself and/or caused by
$X_2$. That is to say, $dH_1/dt$ can be split exclusively into two parts:

$$\frac{dH_1}{dt} = \frac{dH_1^*}{dt} + T_{2\to1},$$

if we write the contribution from the former mechanism as $dH_1^*/dt$ and that from the latter
as $T_{2\to1}$. This $T_{2\to1}$ is the very time rate of information flowing from $X_2$ to $X_1$.

To find the information flow it suffices to find $dH_1^*/dt$, since, for each deterministic
system, there is a Liouville equation for the density of the state and, accordingly, $dH_1/dt$
can be obtained. In Ref. [8], $dH_1^*/dt$ is acquired through an intuitive argument based on an
entropy evolutionary law established therein. The same result was later rigorously proved;
see [9] for a review. For stochastic systems, which we will be considering in this study, the
trick ceases to work, but in [10] Liang manages to circumvent the difficulty and find the
result, which we briefly review in the following.

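Consider a two-dimensional (2D) stochastic system

$$d\mathbf{X} = \mathbf{F}(\mathbf{X}, t)\,dt + \mathbf{B}(\mathbf{X}, t)\,d\mathbf{W}, \tag{1}$$

where $\mathbf{F} = (F_1, F_2)^T$ is the vector of drift coefficients (a differentiable vector field),
$\mathbf{B} = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$ the matrix of stochastic
perturbation coefficients, and $\mathbf{W}$ a 2D standard Wiener process. Let
$g_{ij} = \sum_k b_{ik} b_{jk}$, and $\rho_i$ be the marginal probability density function of $X_i$. Liang
(2008) [10] proves that the time rate of information flowing from $X_2$ to $X_1$ is

$$T_{2\to1} = -E\left(\frac{1}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1}\right)
 + \frac{1}{2} E\left(\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right), \tag{2}$$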


where $E$ signifies mathematical expectation. This measure of information flow
is asymmetric between the two parties and, in particular, if the process underlying $X_1$ does
not depend on $X_2$, then the resulting causality from $X_2$ to $X_1$ vanishes. This is the so-called
property of causality, which asserts, in the above language, that $T_{2\to1}$ vanishes if $F_1$ and
$g_{11}$ are independent of $x_2$. When $T_{2\to1}$ is nonzero, it may take positive or negative values.
A positive $T_{2\to1}$ means that $X_2$ causes $X_1$ to be more uncertain, while a negative $T_{2\to1}$ reduces
the entropy of $X_1$, and hence functions to stabilize the latter. For more details, the reader is
referred to Ref. [10].
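
As a brief sketch of why the property of causality is consistent with Eq. (2) (an outline only, not the proof given in the cited work, and assuming that $F_1\rho_1$, $g_{11}\rho_1$ and the first derivative of the latter vanish at the boundaries), write the joint density as $\rho = \rho_1\,\rho_{2|1}$. If $F_1$ and $g_{11}$ do not depend on $x_2$, the integration over $x_2$ can be carried out first:

```latex
\begin{aligned}
E\left(\frac{1}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1}\right)
  &= \iint \rho_{2|1}(x_2|x_1)\,\frac{\partial (F_1\rho_1)}{\partial x_1}\,dx_2\,dx_1
   = \int \frac{\partial (F_1\rho_1)}{\partial x_1}\,dx_1 = 0, \\
E\left(\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right)
  &= \iint \rho_{2|1}(x_2|x_1)\,\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\,dx_2\,dx_1
   = \int \frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\,dx_1 = 0 .
\end{aligned}
```

Both terms of Eq. (2) then vanish separately, so that $T_{2\to1} = 0$.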

The above theorem has recently been applied to time series analysis. Under the assumption of
a linear model with additive noise, Liang [11] shows that the maximum likelihood estimate of
the information flow in (2) turns out to be very tight in form, involving only the common
statistics, namely the sample covariances. Take two series $X_1$ and $X_2$ for example. The rate of
information flowing (units: nats per unit time) from $X_2$ to $X_1$ is shown to be

$$T_{2\to1} = \frac{C_{11}C_{12}C_{2,d1} - C_{12}^2 C_{1,d1}}{C_{11}^2 C_{22} - C_{11}C_{12}^2}, \tag{3}$$

where $C_{ij}$ is the sample covariance between $X_i$ and $X_j$, and $C_{i,dj}$ is that between $X_i$ and
$\dot X_j$, $\dot X_j$ being the difference approximation of $dX_j/dt$ using the Euler forward scheme:

$$\dot X_j(n) = \frac{X_j(n+k) - X_j(n)}{k\Delta t}. \tag{4}$$


Here $k$ is usually 1, but for highly chaotic and densely sampled series, $k = 2$ should be
chosen to avoid getting a spuriously large $\dot X_j$ due to possible shock structures that make
the differencing highly sensitive to the error in $X_j$. This formula involves only sample
covariances, and is hence very convenient to evaluate. In addition, it is easy to see that if
$C_{12} = 0$ then $T_{2\to1} = 0$, but when $T_{2\to1} = 0$ the correlation $C_{12}$ does not need to vanish.
That is to say, contrapositively, causation implies correlation, but correlation does not imply
causation. In an explicitly quantitative way, this corollary resolves the long-standing debate
over causation versus correlation.
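
For concreteness, Eqs. (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration, not the code used in this study: the function names `cov` and `info_flow` and the two toy series are hypothetical, and covariances are taken over the points for which a forward difference exists. The normalization of the covariance (1/n versus 1/(n-1)) does not matter, since Eq. (3) is a ratio of cubic terms.

```python
def cov(u, v):
    """Sample covariance of two equal-length sequences (1/n normalization)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / n


def info_flow(x1, x2, k=1, dt=1.0):
    """Estimate T_{2->1} (nats per unit time) from Eqs. (3) and (4)."""
    m = len(x1) - k                                   # points that have a forward difference
    dx1 = [(x1[n + k] - x1[n]) / (k * dt) for n in range(m)]   # Eq. (4)
    a, b = x1[:m], x2[:m]
    c11, c22, c12 = cov(a, a), cov(b, b), cov(a, b)
    c1d1, c2d1 = cov(a, dx1), cov(b, dx1)
    numerator = c11 * c12 * c2d1 - c12 ** 2 * c1d1
    denominator = c11 ** 2 * c22 - c11 * c12 ** 2
    return numerator / denominator


# Two toy series that are exactly uncorrelated over the 20 points used (C12 = 0),
# so the estimated flow must vanish, in line with the corollary above.
x1 = [1.0, -1.0, 1.0, -1.0] * 5 + [1.0]
x2 = [1.0, 1.0, -1.0, -1.0] * 5 + [1.0]
print(info_flow(x1, x2))
```

Swapping the two arguments gives the estimate of the reverse flow $T_{1\to2}$.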


[Figure 1: two panels, (a) $\alpha = 1/2$ and (b) $\alpha = 0$; horizontal axes: Iteration.]

FIG. 1: (Color online) A sample path ($X_1$ and $X_2$) of the autoregressive process (5) for (a) $\alpha = 0.5$,
$\beta = 0$, and (b) $\alpha = 0$, $\beta = 0$.


In general, the rates of information flow differ from case to case. In real applications one
would like to normalize them, just as is done with correlation coefficients, in order to
assess the importance of the flow identified, if any. For example, consider two series generated
from two autoregressive processes:

$$X_1(n+1) = 0.1 + 0.5\,X_1(n) + \alpha X_2(n) + e_1(n), \tag{5a}$$

$$X_2(n+1) = 0.7 + \beta X_1(n) + 0.6\,X_2(n) + e_2(n), \tag{5b}$$

where the errors $e_1 \sim N(0,1)$ and $e_2 \sim N(0,1)$ are independent. Generating a pair of series
with 80000 steps in MATLAB and performing the causality analysis (with $k = 1$), one obtains
(units are nats per iteration; same below in this section)

(1) for $(\alpha, \beta) = (0.5, 0)$: $T_{2\to1} = 0.1481$, $T_{1\to2} = -0.0002$;

(2) for $(\alpha, \beta) = (0, 0)$: $T_{2\to1} = -1.37 \times 10^{-5}$, $T_{1\to2} = 1.28 \times 10^{-5}$.

(The results may differ slightly with different series due to the random number generation.)
For the case $\alpha = 0.5$, $|T_{2\to1}/T_{1\to2}| > 740$; one may then conclude that this is a one-way
causality from $X_2$ to $X_1$, as is indeed true. For the latter case, however, one actually cannot say
much from the numbers. Although they are small, they tell no more than that the information
flows in both directions are of equal importance. From this particular example, one
sees that, though theoretically the information flows in both directions should be precisely
zero, in reality the rates evaluated from two time series, albeit very small, generally do not
precisely vanish. One then cannot tell whether the causality indeed exists. We need to
normalize the obtained information flow in order to see its relative magnitude.
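
The experiment just described can be re-created along the following lines. This is a hedged sketch in Python rather than the MATLAB code used here: the helper names, the zero initial state, and the seed are assumptions, and since the random numbers differ, the estimates will not match the quoted values digit for digit.

```python
import random


def cov(u, v):
    """Sample covariance (1/n normalization)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / n


def info_flow(x_to, x_from, k=1, dt=1.0):
    """Eq. (3): estimated information flow from `x_from` to `x_to`."""
    m = len(x_to) - k
    d = [(x_to[n + k] - x_to[n]) / (k * dt) for n in range(m)]   # Eq. (4)
    a, b = x_to[:m], x_from[:m]
    c11, c22, c12 = cov(a, a), cov(b, b), cov(a, b)
    c1d, c2d = cov(a, d), cov(b, d)
    return (c11 * c12 * c2d - c12 ** 2 * c1d) / (c11 ** 2 * c22 - c11 * c12 ** 2)


def simulate_ar(alpha, beta, n_steps=80000, seed=1):
    """Generate (X1, X2) from Eqs. (5a)-(5b) with independent N(0, 1) errors."""
    rng = random.Random(seed)
    x1, x2 = [0.0], [0.0]                 # assumed initial state
    for _ in range(n_steps - 1):
        p, q = x1[-1], x2[-1]
        x1.append(0.1 + 0.5 * p + alpha * q + rng.gauss(0.0, 1.0))
        x2.append(0.7 + beta * p + 0.6 * q + rng.gauss(0.0, 1.0))
    return x1, x2


x1, x2 = simulate_ar(alpha=0.5, beta=0.0)
t21 = info_flow(x1, x2)   # flow X2 -> X1
t12 = info_flow(x2, x1)   # flow X1 -> X2
print(f"T(2->1) = {t21:.4f}, T(1->2) = {t12:.4f}")
```

With $\alpha = 0.5$, $\beta = 0$ the first estimate should come out near 0.15 nats per iteration and the second near zero, reproducing the one-way causality qualitatively.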

Of course, one may perform statistical tests on the results. In the first example, at the 90%
confidence level, $T_{2\to1} = 0.1481 \pm 0.0015$, $T_{1\to2} = -0.0002 \pm 0.0015$, so $T_{1\to2}$ is insignificant.
For the second example, $T_{2\to1} = -1.37 \times 10^{-5} \pm 2.19 \times 10^{-5}$, $T_{1\to2} = 1.28 \times 10^{-5} \pm 2.19 \times 10^{-5}$.
That is to say, at the 90% level, these flow rates are not significantly different from zero. In
this sense, it seems that we could indeed infer the true causality rather accurately. However,
a statistical test just tells how precise the estimate is; it does not tell how the information
flow may weigh in the entropy balance of the series. Besides, it depends on the length of
the series, which is irrelevant to the parameter to be estimated. To see this more clearly,
consider the following case:

$\alpha = 0.01$, $\beta = 0.01$.

Obviously, the information flows, albeit existing, make only tiny contributions to their
respective series, as the coupling coefficients are over an order of magnitude smaller; in classical
perturbation analysis, they can be dropped to the first-order approximation. The computed
information flows, at the 90% confidence level, are $T_{2\to1} = (1.648 \pm 0.692) \times 10^{-4}$,
$T_{1\to2} = (1.123 \pm 0.639) \times 10^{-4}$. These results indicate that they are significantly different
from zero, which from one aspect testifies to the success of the formalism. However, the small
numbers cannot tell how important they are, since, with a slowly varying series, even the
dominant flow could be very small. On the other hand, if we cut the series in half and pick
the first 40000 points for the analysis, then the result will be $T_{2\to1} = (0.653 \pm 0.751) \times 10^{-4}$,
$T_{1\to2} = (1.240 \pm 0.690) \times 10^{-4}$. So one finds that $T_{2\to1}$ is insignificant while $T_{1\to2}$ is significant.
(Again, these small numbers may show large fluctuations if series of different lengths are used,
because in reality they are insignificant.) Can one thus conclude that there is a one-way causality, or
can one thus assert that this shortened series yields a more reliable estimation? Surely
this is absurd. The problem here is that we do need a normalized flow to evaluate its
importance relative to other factors.

The normalization is by no means as simple as it appears with correlation analysis, which
is in an inner product form based on the Cauchy-Schwarz inequality. In the following we
will see that we need to get down to the fundamentals of information flow before arriving at
a logically and physically sound normalizer. As we did before in [11], this normalizer
is estimated with the method of maximum likelihood estimation, using the given time series
(section III). The resulting formula is then validated (section IV) with the autoregression
example shown above. To demonstrate its diverse applicability, two real-world examples are
presented subsequently, one from climate science (section V A), another from financial
economics (section V B). This study is summarized in section VI.


II. INFORMATION FLOW NORMALIZATION 


As mentioned in the introduction, the normalization is not as simple as it seems to be.
A natural normalizer that comes to mind, at the hint of the correlation coefficient, might be the
information of a series transferred from itself. A snag, however, is that this quantity may
turn out to be zero, as is the case with the Hénon map, a benchmark problem we have examined
before (see the references in [^). 

Another snag is that the two-way causality actually cannot be normalized together, as is done in
correlation analysis based on the Cauchy-Schwarz inequality. That is to say, two information
flows of equal size may have different relative importance in their respective series.

Since in our framework the information flow from $X_2$ to $X_1$ is the contribution that $X_2$
makes to the time rate of change of the marginal entropy of $X_1$, written $H_1$, one may ask
whether the rate of marginal entropy change $dH_1/dt$ can be the normalizer. This might be
appealing, but there is a third snag. As information flow can be positive or negative, $dH_1/dt$
may turn out to be smaller than the flow in absolute value; the relative flow so obtained
would exceed 100%, a case which we do not want to see.

All the above tells us that information flow normalization is by no means a trivial task. We
need to get to the basics and analyze how an information flow within a system is derived.
By Ref. [10], the time rate of change of the marginal entropy of $X_1$ is

$$\frac{dH_1}{dt} = -E\left(F_1\frac{\partial \log\rho_1}{\partial x_1}\right)
 - \frac{1}{2}E\left(g_{11}\frac{\partial^2 \log\rho_1}{\partial x_1^2}\right). \tag{6}$$


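It is actually a result of two mutually exclusive mechanisms: the first is the information
flow $T_{2\to1}$ as shown in (2); the second is the complement, i.e., the rate of entropy increase
without taking into account the effect of $X_2$. Denote the latter as $dH_1^*/dt$. Liang (2008) [10]
has proved that

$$\frac{dH_1^*}{dt} = E\left(\frac{\partial F_1}{\partial x_1}\right)
 - \frac{1}{2}E\left(g_{11}\frac{\partial^2 \log\rho_1}{\partial x_1^2}\right)
 - \frac{1}{2}E\left(\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right). \tag{7}$$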


The right-hand side has three terms. The first term is precisely the time rate of change of
$H_1$ due to $X_1$ itself in the absence of stochasticity. This is the starting point which we
showed in 2005 in establishing the rigorous formalism and proved later on (cf. [9]). Hence,
through a careful analysis, the increase in the marginal entropy $H_1$ is decomposed into three
parts:

$$\frac{dH_1^*}{dt} = E\left(\frac{\partial F_1}{\partial x_1}\right), \tag{8}$$

$$\frac{dH_1^{noise}}{dt} = -\frac{1}{2}E\left(g_{11}\frac{\partial^2 \log\rho_1}{\partial x_1^2}\right)
 - \frac{1}{2}E\left(\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right), \tag{9}$$

as well as $T_{2\to1}$ in (2), which correspond to, respectively, the contribution due to $X_1$ itself,
the stochastic effect, and the information flowing from $X_2$. Note that this decomposition does
not appear explicitly in the marginal entropy evolution equation (6), as the two stochastic
terms cancel out.
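
The cancellation can be checked directly. The following short verification is our own bookkeeping of the three parts, written out under the assumption that the densities are smooth enough for the product rule to apply; it adds Eqs. (2), (8) and (9) and uses $\frac{1}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1} = \frac{\partial F_1}{\partial x_1} + F_1\frac{\partial \log\rho_1}{\partial x_1}$:

```latex
\begin{aligned}
T_{2\to1} + \frac{dH_1^*}{dt} + \frac{dH_1^{noise}}{dt}
 &= -E\left(\frac{\partial F_1}{\partial x_1}\right)
    - E\left(F_1\frac{\partial \log\rho_1}{\partial x_1}\right)
    + \frac{1}{2}E\left(\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right) \\
 &\quad + E\left(\frac{\partial F_1}{\partial x_1}\right)
    - \frac{1}{2}E\left(g_{11}\frac{\partial^2 \log\rho_1}{\partial x_1^2}\right)
    - \frac{1}{2}E\left(\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right) \\
 &= -E\left(F_1\frac{\partial \log\rho_1}{\partial x_1}\right)
    - \frac{1}{2}E\left(g_{11}\frac{\partial^2 \log\rho_1}{\partial x_1^2}\right)
  = \frac{dH_1}{dt}.
\end{aligned}
```

The stochastic term in $T_{2\to1}$ and the second term of Eq. (9) are equal and opposite, which is why only their sum, and not the individual parts, is visible in Eq. (6).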

The normalization is now made easy. Let

$$Z = |T_{2\to1}| + \left|\frac{dH_1^*}{dt}\right| + \left|\frac{dH_1^{noise}}{dt}\right|. \tag{10}$$

Obviously it is no less than $T_{2\to1}$ in magnitude, and cannot be zero unless $X_1$ does not
change, a situation that is excluded in time series analysis. We may therefore pick $Z$ as the
normalizer, and define

$$\tau_{2\to1} = T_{2\to1}/Z. \tag{11}$$

This way, if $|\tau_{2\to1}| = 1$, the variation of $H_1$ is 100% due to the information flow from $X_2$; if
$\tau_{2\to1}$ is approximately zero, $X_2$ is not causal. Therefore, $\tau_{2\to1}$ assesses the importance of the
influence of $X_2$ on $X_1$ relative to other processes.

It should be pointed out that the above normalizer applies to $T_{2\to1}$ only. For $T_{1\to2}$, it is

$$|T_{1\to2}| + \left|\frac{dH_2^*}{dt}\right| + \left|\frac{dH_2^{noise}}{dt}\right|,$$

which may be quite different in value. This from another aspect reflects the asymmetry
between $T_{2\to1}$ and $T_{1\to2}$.


III. ESTIMATION 


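As in Ref. [11], consider a linear version of the stochastic differential equation (SDE) (1),

$$d\mathbf{X} = \mathbf{f}\,dt + \mathbf{A}\mathbf{X}\,dt + \mathbf{B}\,d\mathbf{W}, \tag{12}$$

where $\mathbf{f}$ is a constant vector, and $\mathbf{A} = (a_{ij})$ and $\mathbf{B} = (b_{ij})$ are constant matrices.
If $\mathbf{X}$ initially obeys a Gaussian distribution, then it is Gaussian forever, i.e.,

$$\rho(\mathbf{x}) = \frac{1}{2\pi(\det\boldsymbol\Sigma)^{1/2}}\,
 e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)}, \tag{13}$$

with the mean $\boldsymbol\mu$ and covariance matrix $\boldsymbol\Sigma$ governed by the equations

$$\frac{d\boldsymbol\mu}{dt} = \mathbf{f} + \mathbf{A}\boldsymbol\mu, \tag{14}$$

$$\frac{d\boldsymbol\Sigma}{dt} = \mathbf{A}\boldsymbol\Sigma + \boldsymbol\Sigma\mathbf{A}^T + \mathbf{B}\mathbf{B}^T. \tag{15}$$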
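So Eqs. (8) and (9) can be explicitly evaluated:

$$\frac{dH_1^*}{dt} = E(a_{11}) = a_{11}, \tag{16}$$

and

$$\begin{aligned}
\frac{dH_1^{noise}}{dt}
 &= -\frac{1}{2}E\left(g_{11}\frac{\partial^2 \log\rho_1}{\partial x_1^2}\right)
    - \frac{1}{2}E\left(\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right) \\
 &= -\frac{1}{2}E\left[g_{11}\cdot\left(-\frac{1}{\sigma_1^2}\right)\right]
    - \frac{1}{2}\iint \rho_{2|1}\,\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\,dx_1\,dx_2 \\
 &= \frac{1}{2}\frac{g_{11}}{\sigma_1^2}
    - \frac{1}{2}\int \left[\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}
      \int \rho_{2|1}(x_2|x_1)\,dx_2\right] dx_1,
\end{aligned}$$

since neither $g_{11}$ nor $\rho_1$ depends on $x_2$. But $\int \rho_{2|1}\,dx_2 = 1$, and $\rho_1$ is compactly
supported; the whole second term on the right-hand side then vanishes. Hence

$$\frac{dH_1^{noise}}{dt} = \frac{1}{2}\frac{g_{11}}{\sigma_{11}}, \tag{17}$$

where for notational symmetry $\sigma_1^2$ has been written as $\sigma_{11}$. These, together with the
information flow from $X_2$ to $X_1$ as we have obtained before [8],
$T_{2\to1} = \frac{\sigma_{12}}{\sigma_{11}}a_{12}$, form the three
constituents that account for the evolution of the marginal entropy of $X_1$.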

An observation about $dH_1^{noise}/dt = g_{11}/(2\sigma_{11})$, where $g_{11} = b_{11}^2 + b_{12}^2$, is that it is
always positive. That is to say, the noise always contributes to increasing the marginal entropy of
$X_1$, conforming to our common sense. In financial economics, this reflects the volatility of,
say, a stock. On the other hand, for a stationary series, i.e., when $d/dt = 0$, the balance on
the right-hand side of Eq. (15) requires that $2\sigma_{11} \sim g_{11}$. So this quantity is also related to
the noise-to-signal ratio.
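
The stationary balance can be made explicit. The lines below are our own reading of the (1,1) component of Eq. (15), offered as a short worked step rather than a statement taken from the cited works:

```latex
\begin{aligned}
\frac{d\sigma_{11}}{dt} &= 2\,(a_{11}\sigma_{11} + a_{12}\sigma_{12}) + g_{11}, \\
\frac{1}{2\sigma_{11}}\frac{d\sigma_{11}}{dt}
 &= a_{11} + \frac{\sigma_{12}}{\sigma_{11}}a_{12} + \frac{g_{11}}{2\sigma_{11}}
  = \frac{dH_1^*}{dt} + T_{2\to1} + \frac{dH_1^{noise}}{dt}.
\end{aligned}
```

For a Gaussian marginal, $H_1 = \frac{1}{2}\log(2\pi e\,\sigma_{11})$, so the left-hand side of the second line is just $dH_1/dt$. In the stationary case it vanishes, and the noise contribution $g_{11}/(2\sigma_{11})$ exactly offsets the sum of the other two constituents.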

The above results need to be estimated if what we are given is just a pair of time series.
That is to say, what we know is a single realization of some unknown system, which, if known,
can produce infinitely many realizations. The problem now turns into estimating (16)
and (17) with the available statistics of the given time series.

We use maximum likelihood estimation (e.g., [12]) to achieve the goal. The procedure
follows precisely that of [11], which for easy reference we briefly summarize here. As
established before, a further assumption that $b_{12} = 0$, and hence $g_{11} = b_{11}^2$, will much simplify the
result, while in practice this is quite reasonable.

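Suppose that the series are equally spaced with a time stepsize $\Delta t$, and let $N$ be the
sample size. Consider an interval $[n\Delta t, (n+1)\Delta t]$, and let the transition probability density
function (pdf) be $p(\mathbf{X}_{n+1}|\mathbf{X}_n; \boldsymbol\theta)$, where $\boldsymbol\theta$ stands for the vector of
parameters to be estimated. So the log likelihood is

$$\ell_N(\boldsymbol\theta) = \sum_{n=1}^{N} \log p(\mathbf{X}_{n+1}|\mathbf{X}_n; \boldsymbol\theta) + \log p(\mathbf{X}_1).$$

As $N$ is usually large, the term $\log p(\mathbf{X}_1)$ can be dropped without causing much error. The
transition pdf is, with the Euler-Bernstein approximation (see [11]),

$$p(\mathbf{X}_{n+1} = \mathbf{x}_{n+1}|\mathbf{X}_n = \mathbf{x}_n)
 = \frac{1}{[(2\pi)^2\det(\mathbf{B}\mathbf{B}^T\Delta t)]^{1/2}}\,
 e^{-\frac{1}{2}(\mathbf{x}_{n+1}-\mathbf{x}_n-\mathbf{F}\Delta t)^T(\mathbf{B}\mathbf{B}^T\Delta t)^{-1}(\mathbf{x}_{n+1}-\mathbf{x}_n-\mathbf{F}\Delta t)},$$

where $\mathbf{F} = \mathbf{f} + \mathbf{A}\mathbf{X}$. This results in a log likelihood functional

$$\ell_N(\mathbf{f}, \mathbf{A}, \mathbf{B}) = \mathrm{const} - \frac{N}{2}\log (g_{11}g_{22})
 - \frac{\Delta t}{2}\left(\frac{1}{g_{11}}\sum_{n=1}^{N} R_{1,n}^2 + \frac{1}{g_{22}}\sum_{n=1}^{N} R_{2,n}^2\right), \tag{18}$$

where

$$R_{i,n} = \dot X_{i,n} - (f_i + a_{i1}X_{1,n} + a_{i2}X_{2,n}), \qquad i = 1, 2, \tag{19}$$

and $\dot X_{i,n}$ is the Euler forward differencing approximation of $dX_i/dt$:

$$\dot X_{i,n} = \frac{X_{i,n+k} - X_{i,n}}{k\Delta t}, \tag{20}$$

with $k \ge 1$. Usually $k = 1$ should be used to ensure accuracy, but in some cases of
deterministic chaos where the sampling is at the highest resolution, one needs to choose $k = 2$.
Maximizing $\ell_N$, we find [11] that the maximizer $(\hat f_1, \hat a_{11}, \hat a_{12})$ satisfies the following algebraic
equation: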


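$$\begin{pmatrix}
1 & \overline{X_1} & \overline{X_2} \\
\overline{X_1} & \overline{X_1^2} & \overline{X_1X_2} \\
\overline{X_2} & \overline{X_1X_2} & \overline{X_2^2}
\end{pmatrix}
\begin{pmatrix} \hat f_1 \\ \hat a_{11} \\ \hat a_{12} \end{pmatrix}
=
\begin{pmatrix} \overline{\dot X_1} \\ \overline{X_1\dot X_1} \\ \overline{X_2\dot X_1} \end{pmatrix}, \tag{21}$$

where the overline signifies sample mean. After some manipulations (see [11]), this yields
the MLE estimators:

$$\hat a_{11} = \frac{C_{22}C_{1,d1} - C_{12}C_{2,d1}}{\det\mathbf{C}}, \tag{22}$$

$$\hat a_{12} = \frac{-C_{12}C_{1,d1} + C_{11}C_{2,d1}}{\det\mathbf{C}}, \tag{23}$$

$$\hat f_1 = \overline{\dot X_1} - \hat a_{11}\overline{X_1} - \hat a_{12}\overline{X_2}, \tag{24}$$

$$\hat b_{11}^2 = \frac{Q_{N,1}\Delta t}{N}, \tag{25}$$

where

$$C_{ij} = \overline{(X_i - \overline{X_i})(X_j - \overline{X_j})}, \tag{26}$$

$$C_{i,dj} = \overline{(X_i - \overline{X_i})(\dot X_j - \overline{\dot X_j})}, \tag{27}$$

are the sample covariances, and

$$\begin{aligned}
Q_{N,1} &= \sum_{n=1}^{N}\left[\dot X_{1,n} - (\hat f_1 + \hat a_{11}X_{1,n} + \hat a_{12}X_{2,n})\right]^2 \\
 &= \sum_{n=1}^{N}\left[(\dot X_{1,n} - \overline{\dot X_1}) - \hat a_{11}(X_{1,n} - \overline{X_1}) - \hat a_{12}(X_{2,n} - \overline{X_2})\right]^2 \\
 &= N\left(C_{d1,d1} + \hat a_{11}^2C_{11} + \hat a_{12}^2C_{22} - 2\hat a_{11}C_{d1,1} - 2\hat a_{12}C_{d1,2} + 2\hat a_{11}\hat a_{12}C_{12}\right).
\end{aligned} \tag{28}$$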


On the other hand, the population covariance matrix $\boldsymbol\Sigma$ can be rather accurately
estimated by the sample covariance matrix $\mathbf{C}$. So (16)-(17) become

$$\frac{dH_1^*}{dt} = \frac{C_{22}C_{1,d1} - C_{12}C_{2,d1}}{C_{11}C_{22} - C_{12}^2}, \tag{29}$$

$$\frac{dH_1^{noise}}{dt} = \frac{\Delta t}{2NC_{11}}\,Q_{N,1}. \tag{30}$$

As in Ref. [11], $T_{2\to1}$, $dH_1^*/dt$ and $dH_1^{noise}/dt$ here should bear a hat, since they are the
corresponding estimators. We abuse the notation a little to avoid notational complexity;
from now on they should be understood as their respective estimators. With these the
normalizer is

$$Z = |T_{2\to1}| + \left|\frac{dH_1^*}{dt}\right| + \left|\frac{dH_1^{noise}}{dt}\right|, \tag{31}$$

and hence we have the relative information flow from $X_2$ to $X_1$:

$$\tau_{2\to1} = \frac{T_{2\to1}}{Z}. \tag{32}$$
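
The estimation chain of Eqs. (22), (23) and (28)-(32) can be condensed into one routine. The sketch below is illustrative only: the function names, the seed and the zero initial state are assumptions, the series is generated from the autoregressive model (5) with $\alpha = 0.5$, $\beta = 0$, and $N$ in Eq. (28) is taken as the number of points that have a forward difference.

```python
import random


def cov(u, v):
    """Sample covariance (1/n normalization)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / n


def relative_info_flow(x1, x2, k=1, dt=1.0):
    """Return (T, tau) for the flow X2 -> X1."""
    m = len(x1) - k
    d = [(x1[n + k] - x1[n]) / (k * dt) for n in range(m)]        # Eq. (20)
    a, b = x1[:m], x2[:m]
    c11, c12, c22 = cov(a, a), cov(a, b), cov(b, b)
    c1d, c2d, cdd = cov(a, d), cov(b, d), cov(d, d)
    det = c11 * c22 - c12 ** 2
    a11 = (c22 * c1d - c12 * c2d) / det       # Eq. (22); also dH1*/dt, Eq. (29)
    a12 = (-c12 * c1d + c11 * c2d) / det      # Eq. (23)
    t21 = a12 * c12 / c11                     # algebraically identical to Eq. (3)
    q_over_n = (cdd + a11 ** 2 * c11 + a12 ** 2 * c22
                - 2 * a11 * c1d - 2 * a12 * c2d + 2 * a11 * a12 * c12)   # Eq. (28) / N
    noise = dt * q_over_n / (2.0 * c11)       # Eq. (30)
    z = abs(t21) + abs(a11) + abs(noise)      # Eq. (31)
    return t21, t21 / z                       # Eq. (32)


def simulate_ar(alpha, beta, n_steps=80000, seed=7):
    """Generate (X1, X2) from Eqs. (5a)-(5b)."""
    rng = random.Random(seed)
    x1, x2 = [0.0], [0.0]                     # assumed initial state
    for _ in range(n_steps - 1):
        p, q = x1[-1], x2[-1]
        x1.append(0.1 + 0.5 * p + alpha * q + rng.gauss(0.0, 1.0))
        x2.append(0.7 + beta * p + 0.6 * q + rng.gauss(0.0, 1.0))
    return x1, x2


x1, x2 = simulate_ar(alpha=0.5, beta=0.0)
t21, tau21 = relative_info_flow(x1, x2)   # X2 -> X1
t12, tau12 = relative_info_flow(x2, x1)   # X1 -> X2 (roles swapped)
print(f"tau(2->1) = {100 * tau21:.1f}%, tau(1->2) = {100 * tau12:.2f}%")
```

By construction $|\tau| \le 1$, and each direction is normalized within the entropy balance of its own recipient series.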


IV. THE AUTOREGRESSIVE EXAMPLE REVISITED 

Back to the autoregressive process exemplified in the beginning. When $\alpha = 0$, $\beta = 0$, the
computed relative information flow rates are:

$\tau_{2\to1} = -0.0016\%$, $\tau_{1\to2} = 0.0018\%$.

Clearly both are negligible in comparison to the contributions from the other processes in
their respective series. This is in agreement with what one would conclude based on the
absolute information flow computation and statistical testing. For the case $\alpha = \beta = 0.01$,
in which one may encounter difficulty due to the ambiguous small numbers, the computed
relative information flow rates are:

$\tau_{2\to1} = 0.018\%$, $\tau_{1\to2} = 0.015\%$.

Again they are essentially negligible, just as one would expect.

On the other hand, when $\alpha = 0.5$, $\beta = 0$,

$\tau_{2\to1} = 17\%$, $\tau_{1\to2} = -0.03\%$.

To $X_1$, the influence from $X_2$ is large, contributing more than 1/6 of the total entropy
change. In contrast, the influence from $X_1$ to $X_2$ is negligible.
It should be pointed out that the relative information flow, say, $\tau_{2\to1}$, makes sense only
with respect to $X_1$, since the comparison is within the series itself. Here comes the following
situation: for a two-way causal system with absolute information flows $T_{2\to1}$ and $T_{1\to2}$ of
equal importance, their relative importance within their respective series could be quite
different. For example,

$$X_1(n+1) = -0.5\,X_1(n) + 0.9\,X_2(n) + 2e_1(n),$$

$$X_2(n+1) = -0.2\,X_1(n) + 0.5\,X_2(n) + e_2(n),$$

where $e_1(n)$ and $e_2(n)$ are independent and identically distributed normals ($\sim N(0,1)$). Initialize
them with random values in $[0,1]$ and generate 80000 data points in MATLAB. The computed
information flow rates (in nats per iteration) are

$|T_{2\to1}| = 0.13$, $|T_{1\to2}| = 0.12$,

which are almost the same. The relative information flows, however, are quite different:

$|\tau_{2\to1}| = 6.7\%$, $|\tau_{1\to2}| = 13\%$.

In terms of relative contribution in their respective series, the former is far below the
latter.

Generally speaking, the above imbalance is a rule, not an exception, reflecting the asymmetry
of information flow. One may reasonably imagine that, in some extreme situation,
a flow might be dominant while its counterpart is negligible within their respective series,
although the two are of the same order in absolute value.
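
The imbalance in the two-way example can also be traced without random numbers, by working with population rather than sample statistics. The following Python sketch is our own illustration under stated assumptions: for a stable AR(1) system with coefficient matrix `phi` and noise covariance `q`, the stationary covariance is obtained by iterating $S \leftarrow \Phi S \Phi^T + Q$, and the Euler forward differencing with $k = 1$ is taken to imply $a_{ij} = (\phi_{ij} - \delta_{ij})/\Delta t$ and $g_{ii} = q_{ii}/\Delta t$ in the large-sample limit. The function names are hypothetical.

```python
def stationary_cov(phi, q, n_iter=500):
    """Iterate S <- phi S phi^T + q to the stationary covariance of a stable 2D AR(1)."""
    s = [row[:] for row in q]
    for _ in range(n_iter):
        ps = [[sum(phi[i][k] * s[k][j] for k in range(2)) for j in range(2)]
              for i in range(2)]
        s = [[sum(ps[i][k] * phi[j][k] for k in range(2)) + q[i][j] for j in range(2)]
             for i in range(2)]
    return s


def population_flow(phi, q, i, j, dt=1.0):
    """Return (T, Z, tau) for the flow X_j -> X_i in the large-sample limit."""
    s = stationary_cov(phi, q)
    a_ii = (phi[i][i] - 1.0) / dt          # self contribution dH_i*/dt
    a_ij = phi[i][j] / dt
    t = a_ij * s[i][j] / s[i][i]           # information flow from X_j to X_i
    noise = (q[i][i] / dt) / (2.0 * s[i][i])
    z = abs(t) + abs(a_ii) + abs(noise)    # normalizer, as in Eq. (31)
    return t, z, t / z


phi = [[-0.5, 0.9], [-0.2, 0.5]]           # coefficients of the two-way example
q = [[4.0, 0.0], [0.0, 1.0]]               # noise variances: (2 e1)^2 and e2^2
t21, z1, tau21 = population_flow(phi, q, 0, 1)   # X2 -> X1
t12, z2, tau12 = population_flow(phi, q, 1, 0)   # X1 -> X2
print(f"|T21| = {abs(t21):.3f}, |T12| = {abs(t12):.3f}")
print(f"Z1 = {z1:.3f}, Z2 = {z2:.3f}")
print(f"|tau21| = {100 * abs(tau21):.1f}%, |tau12| = {100 * abs(tau12):.1f}%")
```

The two absolute flows come out nearly equal, while the normalizer of $X_1$ is about twice that of $X_2$, mainly because the self contribution $|a_{11}|$ is three times $|a_{22}|$. This is why the relative flows differ by roughly a factor of two, in line with the sample-based values quoted above.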


V. APPLICATIONS 


A. Re-examination of the ENSO-IOD relationship 


El Niño, also known as El Niño-Southern Oscillation, or ENSO for short, is a long-known
and extensively studied climate mode in the tropical Pacific Ocean, owing to its relation to
global disasters such as the droughts in Southeast Asia, Southern Africa, and northern
Australia, the floods in Ecuador, the increasing number of typhoons, the death of birds and
dolphins in Peru, and the famine and epidemic diseases in far-flung regions of the world [13].
A correct forecast of an El Niño (or its cold counterpart, La Niña) a few months ahead will
not only help issue advance warnings of potentially disastrous impacts, but also make the
subsequent seasonal weather forecasting much easier. However, this aperiodic leading mode
in the tropical Pacific seems to be extraordinarily difficult to predict. A good example is
the latest “Super El Niño” or “Monster El Niño”, which was predicted in many portentous
forecasts to arrive in 2014 but turned out to be a computer artifact.

For more reliable predictions, it is imperative to clarify the source of its uncertainty or
unpredictability. In Ref. [11], we presented an application of Eq. (3) to the study of the
relation between El Niño and the Indian Ocean Dipole (IOD), another major climate mode in
the Indian Ocean [14], and found that the Indian Ocean is a source of uncertainty that keeps
ENSO from being accurately predicted. Since in that study there is no relative importance
assessment, we do not know whether the information flows, albeit significant, weigh much
in the modal variabilities. We hence redo the computation using the relative information
flow formula (32).

We use for this study the same data as those used in [11], which include the Niño4 index
series and the sea surface temperature (SST) series downloaded from the NOAA ESRL
Physical Sciences Division [15], and the IOD index, namely the DMI series, from the JAMSTEC
site [16].

Shown in Fig. 2 is the relative information flow rate from the Indian Ocean SST to El
Niño, $\tau_{IO\to ENSO}$. From it one can see that the information flow accounts for more than
10% of the uncertainties of Niño4, the maximum reaching 27%. This number is very large.
Besides, all the values are positive, indicating that the Indian Ocean SST functions to make
El Niño more uncertain. No wonder researchers have recently found that assimilation of the Indian
Ocean data helps the prediction of El Niño (e.g., [17]), although traditionally the Indian
Ocean is mostly ignored in El Niño modeling.

Besides the relative importance we have just obtained, Fig. 2 also reveals some difference
in structure from its counterpart, i.e., Fig. 5b of [11] [22]. A conspicuous difference is
that now there are clearly two centers, residing on either side of India. Note that this structure
is different from the traditional dipolar pattern one would expect; here both centers are
positive. This means that the Northern Indian Ocean SST anomalies, both the positive
phase and the negative phase, as an integral entity influence the El Niño variabilities and,
in particular, make El Niño more unpredictable. The dipolar structure implies that most
probably this entity is IOD, not others like IOBM (Indian Ocean Basin Mode; see [18]).


[Figure 2: map; color scale from -0.2 to 0.2.]

FIG. 2: (Color online) Relative information flow from the Indian Ocean SST to Niño4, $\tau_{IO\to ENSO}$.


To see more about this, we look at the information flow from the index DMI to the
tropical Pacific SST. For the absolute rates the reader is referred to Fig. 4a of [11]; shown in
Fig. 3 are $\tau_{DMI\to Pacific}$. Indeed the computed flow rates are significant, and all are positive. The
largest $\tau$, which occupies a large swathe of the equatorial region from 175°W to
135°W, reaches 10%. Moreover, the structure reminds one of the El Niño pattern. It is
generally the same as that in its counterpart, i.e., Fig. 4a of [11], save for two changes:
(1) the maximum center moves westward; (2) the small center of a secondary maximum
near 125°E at the equator disappears. This clear El Niño-like structure attests to the above
conjecture that IOD is indeed a major source of uncertainty for the El Niño forecast.



[Figure 3: map; horizontal axis: Longitude (°E); color scale ticks at 0 and 0.05.]

FIG. 3: (Color online) Relative information flow from the IOD index to the tropical Pacific SST,
$\tau_{IOD\to Pacific}$.


We have also computed the relative information flows from El Niño to the Indian Ocean
SST, and that from the Pacific SST to the IOD, using the same datasets. The results are
also significant, though only approximately half as large as those shown above.


B. A financial example 

We now look at the causal relations between several financial time series. Here it is
not our intention to conduct financial economics research or study the market dynamics
from an econophysical point of view; our purpose is to demonstrate a brief application of
the aforementioned formalism for time series analysis. Nonetheless, this topic is indeed of
interest to both physicists and economists in the field of macroscopic econophysics; see, for
example, fl^ . 

We pick nine stocks in the United States and download their daily prices from _ 

These stocks are: MSFT (Microsoft Corporation), AAPL (Apple Inc.), IBM (International
Business Machines Corporation), INTC (Intel Corporation), GE (General Electric Company),
WMT (Wal-Mart Stores Inc.), XOM (Exxon Mobil Corporation), CVS (CVS Health
Corporation), and F (Ford Motor Company). Among these are high-tech companies (MSFT,
AAPL, IBM, INTC), retail trade companies [e.g., the drugstore chain (CVS) and the discount
store (WMT)], the automotive industry (F), the oil and gas industry (XOM), and the multinational
conglomerate GE, which operates through the segments of energy, technology
infrastructure, capital finance, etc. Here by “daily” we mean on a trading-day basis, excluding,
say, holidays and weekends. Since stock prices are generally nonstationary, we check
the series of daily return, i.e.,

$$R(t) = [P(t + \Delta t) - P(t)]/P(t),$$

or log-return

$$r(t) = \ln P(t + \Delta t) - \ln P(t),$$

where $P(t)$ are the adjusted closing prices in the Yahoo spreadsheet, and $\Delta t$ is one trading
day. Following common practice, we use the series of log-returns $r$ for our purpose. In fact,
return and log-return series are approximately equivalent, particularly in the high-frequency
regime, as indicated by jl^. Since the most recent stock MSFT started on March 13, 1986, 
all the series are chosen from that date through December 26, 2014, when we started to 
examine these series. This amounts to 7260 data points, and hence 7259 points for the 
log-return series. 
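
As a small illustration of the two definitions above, and of why a price series of $n$ points yields $n - 1$ returns, consider the following Python sketch. The four prices are made-up numbers, not data from this study, and the function names are hypothetical.

```python
import math


def simple_returns(prices):
    """R(t) = [P(t + dt) - P(t)] / P(t) for consecutive trading days."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]


def log_returns(prices):
    """r(t) = ln P(t + dt) - ln P(t) for consecutive trading days."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]


prices = [100.0, 101.0, 99.5, 100.2]     # hypothetical adjusted closing prices
R = simple_returns(prices)
r = log_returns(prices)
for big_r, small_r in zip(R, r):
    print(f"R = {big_r:+.5f}   r = {small_r:+.5f}")
```

For daily price changes of about one percent the two series differ only in the fourth decimal place, consistent with the approximate equivalence noted above.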

Using Eq. (3), we compute the information flows between the nine stocks and form a
matrix of flow rates; see Table I. A flow direction is represented with the matrix indices;
more specifically, it is from the row index to the column index. For example, listed at
the location (2,4) is $T_{2\to4}$, i.e., $T_{AAPL\to INTC}$, the flow rate from Apple to Intel, while (4,2)
stores the rate of the reverse flow, $T_{INTC\to AAPL}$. Also listed in the table are the respective
confidence intervals at the 90% level.

From Table I, most of the information flow rates are significant at the 90% level, as
highlighted. Their magnitudes vary from 4 to 22 (units: $10^{-3}$ nats/day; same below in this section).
The maximum is $|T_{IBM\to XOM}| = 22$, and second to it are $|T_{WMT\to CVS}|$ and $|T_{CVS\to GE}|$, both
being 21. The influence of IBM on Exxon is not a surprise, considering the dependence of
the oil industry on high-tech equipment. The mutual causality between the retail stores
WMT and CVS is also understandable. The information flow from CVS to GE could be
through the sales of GE products; after all, GE makes household appliances. The rest of
the table can be summarized from the following two aspects.
(1) Companies as sources.

Look at the table row by row. Perhaps the most conspicuous feature is that the whole
CVS row is significant. Next to it is XOM, with only three entries insignificant. That is
to say, CVS has been found causal to all other stocks, though the causality magnitudes
are yet to be assessed (see below). This does make sense. As a chain of convenience
stores, CVS connects most of the general consumers and commodities and hence the
corresponding industries. For XOM, it is also understandable why it makes a
source of causality. Oil or gas is for sure one of the most fundamental components of
the American economy.

(2) Companies as recipients.

Examining column by column, the most outstanding stock is again CVS, with only
one entry insignificant. That is to say, CVS is influenced by all other stocks except
XOM. Following CVS are XOM, WMT, and INTC. The IBM and MSFT columns form
the third tier.

A few words regarding the stock F. As a cause to other stocks (though the causality may be
tiny), XOM has not been identified to be causal to F. In fact, F has not been found causal
to XOM, either. This is a little surprising; the reason(s) can be found only after a careful
analysis of Ford, which is beyond the scope here. (In fact, computation does reveal information
flows between XOM and Toyota.) Interestingly, $|T_{F\to WMT}| > |T_{F\to CVS}|$. This is easy to
understand, as we rely on our motor vehicles to shop at Wal-Mart, while CVS stores could
be just somewhere in the neighborhood!


TABLE I: The rates of absolute information flow between the 9 chosen stocks (in 10“^ nats per 
trading day). At each entry the direction is from the row index to the column index of the matrix. 
Also listed are the standard errors at a 90% significance level, and highlighted are the significant 
flows. 


        MSFT     AAPL     IBM      INTC     GE       WMT      XOM      CVS      F
MSFT    /        5±7      -3±8     -12±11   1±8      -1±6     -12±6    10±4     3±5
AAPL    -2±7     /        -11±7    -2±9     -7±6     -4±5     -11±4    4±3      -2±4
IBM     0±8      5±7      /        -9±9     -8±9     -11±6    -22±7    6±4      1±6
INTC    16±11    10±9     -7±9     /        0±8      -2±6     -12±5    11±4     3±6
GE      2±8      -3±6     -13±9    -16±8    /        -10±9    -6±9     14±6     6±9
WMT     10±6     7±5      4±6      -5±6     6±9      /        0±6      21±7     9±5
XOM     -10±6    -3±4     -14±7    -13±5    -15±9    -17±6    /        4±5      -1±6
CVS     -9±4     -5±3     -12±4    -11±4    -21±6    -17±7    -17±5    /        -7±4
F       0±5      0±4      0±6      -10±6    6±9      -13±5    1±6      6±4      /



The above significant absolute information flows, large or small, still need to be assessed
regarding their respective relative importance before any conclusion of causality is reached.
Using Eq. (32), we compute the relative information flow rates (in percentage) and tabulate
them in Table II. The matrix entries are arranged in the same way as above. For clarity,
those above or equal to 1% are highlighted. In contrast to Table I, we see that only a few of
the information flows account for 1% or more of their respective fluctuations. These
include $|\tau_{\mathrm{IBM}\to\mathrm{XOM}}|$, $|\tau_{\mathrm{CVS}\to\mathrm{GE}}|$, $|\tau_{\mathrm{WMT}\to\mathrm{CVS}}|$, $|\tau_{\mathrm{CVS}\to\mathrm{WMT}}|$, $|\tau_{\mathrm{CVS}\to\mathrm{XOM}}|$, $|\tau_{\mathrm{XOM}\to\mathrm{WMT}}|$,
$|\tau_{\mathrm{INTC}\to\mathrm{MSFT}}|$, $|\tau_{\mathrm{GE}\to\mathrm{INTC}}|$, and $|\tau_{\mathrm{XOM}\to\mathrm{GE}}|$, the first three being the largest. This echoes
what we have introduced in the beginning: though significant, some information flows may
be negligible in their own marginal entropy balances.


TABLE II: As Table I, but for relative information flow (in percentage).


        MSFT    AAPL    IBM     INTC    GE      WMT     XOM     CVS     F
MSFT    /       0.3     -0.2    -0.8    0.0     0.0     -0.7    0.6     0.2
AAPL    -0.1    /       -0.7    -0.1    -0.5    -0.2    -0.7    0.2     -0.1
IBM     0.0     0.3     /       -0.6    -0.5    -0.7    -1.3    0.4     0.1
INTC    1.0     0.7     -0.4    /       0.0     -0.1    -0.7    0.7     0.2
GE      0.1     -0.2    -0.8    1.0     /       -0.6    -0.3    0.9     0.4
WMT     0.6     0.4     0.2     -0.3    0.4     /       0.0     1.3     0.6
XOM     -0.6    -0.2    -0.9    -0.9    -1.0    -1.1    /       0.3     -0.1
CVS     -0.6    -0.3    -0.8    -0.7    -1.3    -1.1    -1.1    /       -0.5
F       0.0     0.0     0.0     -0.6    0.4     -0.8    0.0     0.4     /



It should be noted that causal relations generally change with time. If the series are
long enough, we may look at how these information flows vary from period to period.
Pick the pair (IBM, GE) as an example. For the duration (March 1986 through present)
considered above, $T_{\mathrm{GE}\to\mathrm{IBM}} = -13 \pm 9$, while $T_{\mathrm{IBM}\to\mathrm{GE}}$ is not significant. Neither $\tau_{\mathrm{GE}\to\mathrm{IBM}}$
nor $\tau_{\mathrm{IBM}\to\mathrm{GE}}$ reaches 1%. Since on the Yahoo site both GE and IBM can be dated back
to January 2, 1962, we can extend the time series considerably, up to 13338 data points. Shown in
Fig. 4a are the series of their historic prices, and in Figs. 4b and 4c the corresponding
log-returns.
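
For readers who want to reproduce the preprocessing, log-returns can be formed from a price series as sketched below. This uses the standard definition $r_n = \ln(P_{n+1}/P_n)$; the exact preprocessing used for Fig. 4 is not spelled out in this section, and the function name and the three prices are hypothetical.

```python
import math


def log_returns(prices):
    """Return r_n = ln(P_{n+1} / P_n) for each pair of consecutive prices."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


# Hypothetical closing prices, not taken from the IBM or GE records.
prices = [100.0, 110.0, 121.0]
returns = log_returns(prices)
print(returns)
```

A series of N prices yields N - 1 log-returns, so the two return series should be aligned on the same trading days before any covariance is computed.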

Computation of information flows with the whole series (13338 points) results in
$\tau_{\mathrm{IBM}\to\mathrm{GE}} = 1.6\%$ and $\tau_{\mathrm{GE}\to\mathrm{IBM}} = -0.5\%$, and $T_{\mathrm{IBM}\to\mathrm{GE}} = (27 \pm 6) \times$ 10“^, $T_{\mathrm{GE}\to\mathrm{IBM}} =
(7 \pm 6) \times$ 10“^ nats/day, both being significant at the 90% level. This is very different from
what is shown in Tables I and II, with the causal structure changed from a weak two-way
causality to a stronger and more or less one-way causality. Since in the above only the data
of the recent 30 years are used, we expect that in the early years this causal structure could
be much enhanced. Choosing the first 7000 points (from January 1962 through November
1989), the computed information flow rates are:


$\tau_{\mathrm{IBM}\to\mathrm{GE}} = 3.1\%$, $\tau_{\mathrm{GE}\to\mathrm{IBM}} = -0.2\%$;

$T_{\mathrm{IBM}\to\mathrm{GE}} = 54 \pm 8$, $T_{\mathrm{GE}\to\mathrm{IBM}} = 3 \pm 8$,

where the units for the latter pair are in 10“^ nats/day, same below in this section. Further
narrowing down the period to points 2250-3250 (corresponding to the period 1971-1975), we have

$\tau_{\mathrm{IBM}\to\mathrm{GE}} = 5.7\%$, $\tau_{\mathrm{GE}\to\mathrm{IBM}} = -0.99\%$;

$T_{\mathrm{IBM}\to\mathrm{GE}} = 101 \pm 21$, $T_{\mathrm{GE}\to\mathrm{IBM}} = 14 \pm 21$,

attaining the maximum of $\tau_{\mathrm{IBM}\to\mathrm{GE}}$, in contrast to the insignificant flow in Table I. Obviously,
during this period, the causality can be viewed as one-way, i.e., from IBM to GE. And
the relative flow makes up more than 5%, much larger than those in Table II.

FIG. 4: (Color online) (a) Historic prices of IBM and GE (in US dollars). (b) Log-returns of the
IBM and GE stocks. (c) A close-up of (b).


The above remarkable causal structure for that particular period can actually be traced
back to the history of GE[^. There was a period in the 1960s when the “Seven Dwarfs”
(Burroughs, Sperry Rand, Control Data, Honeywell, General Electric, RCA and NCR)
competed with IBM, the giant, for the computer business, particularly in building mainframes. In
1965, GE had only a 3.7% market share of the industry, though it was then dubbed
the “King of the Dwarfs”, while IBM had a 65.3% share. Historically GE was once the largest
computer user outside the US Federal Government; it got into computer manufacturing to
avoid dependency on others. And, indeed, throughout the 1960s, the causalities between GE
and IBM are not significant. Why, then, as time entered the 1970s, did the information flow from
IBM to GE suddenly increase to its highest level? It turns out that GE sold its computer
division to Honeywell in 1970; in the following years (starting from 1971), it relied much
on IBM products. This GE computer era, which has almost sunk into oblivion, does
substantiate the existence of a causation between GE and IBM, and, to be more precise, an
essentially one-way causation from IBM to GE. In this sense, our formalism is well validated.


VI. CONCLUSIONS 

To assess the importance of a flow of information from a series, say, $\{X_2(n)\}$, to another,
say, $\{X_1(n)\}$, it needs to be normalized. The normalization cannot follow the way a
correlation coefficient is computed, since there is no theorem like the Cauchy-Schwarz
inequality on which to base it. Getting down to the fundamentals, we were able to distinguish
three types of mechanisms that contribute to the evolution of the marginal entropy of $X_1$:

• $dH_1^*/dt$: the contribution from $X_1$ itself;

• $T_{2\to1}$: the information flow from $X_2$;

• $dH_1^{noise}/dt$: the contribution from noise.

Similarly there are three such quantities for $X_2$, as schematized in Fig. 5. We hence proposed
that the normalization can be fulfilled as follows:


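$$\tau_{2\to1} = \frac{T_{2\to1}}{\left|\dfrac{dH_1^*}{dt}\right| + \left|\dfrac{dH_1^{noise}}{dt}\right| + |T_{2\to1}|},
\qquad
\tau_{1\to2} = \frac{T_{1\to2}}{\left|\dfrac{dH_2^*}{dt}\right| + \left|\dfrac{dH_2^{noise}}{dt}\right| + |T_{1\to2}|}.$$
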

Obviously, a normalized flow tells its importance relative to other mechanisms within its 
own series. In other words, the two flows are normalized differently, echoing the property 
of asymmetry which makes information flow analysis distinctly different from those such as 
correlation analysis or mutual information analysis. 



FIG. 5: A schematic of the marginal entropy evolutions and information flows in the system of
$(X_1, X_2)$.


The above normalizer can be accurately obtained in the framework of a dynamical system.
When only two equi-distanced series, say, $X_1$ and $X_2$, are given, with a linear model its
constituents can be estimated as follows, in a nutshell,

$$\frac{dH_1^*}{dt} = p, \tag{33}$$

$$\frac{dH_1^{noise}}{dt} = \frac{\Delta t}{2C_{11}} \left( C_{d1,d1} + p^2 C_{11} + q^2 C_{22} - 2p\,C_{d1,1} - 2q\,C_{d1,2} + 2pq\,C_{12} \right), \tag{34}$$

$$T_{2\to1} = \frac{C_{11}C_{12}C_{2,d1} - C_{12}^2 C_{1,d1}}{C_{11}^2 C_{22} - C_{11}C_{12}^2}, \tag{35}$$

where $\Delta t$ is the time stepsize,

$$p = \frac{C_{22}C_{1,d1} - C_{12}C_{2,d1}}{C_{11}C_{22} - C_{12}^2},
\qquad
q = \frac{-C_{12}C_{1,d1} + C_{11}C_{2,d1}}{C_{11}C_{22} - C_{12}^2},$$

$C_{ij}$ is the sample covariance between $X_i$ and $X_j$, $C_{i,dj}$ the sample covariance between $X_i$ and
$\dot{X}_j$, and

$$\dot{X}_{j,n} = \frac{X_{j,n+k} - X_{j,n}}{k\,\Delta t}$$

($k = 1$; but for chaotic series sampled at high resolution, $k = 2$ may be needed).
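
As an illustration of how Eqs. (33)-(35) might be put to work, the following is a minimal sketch in plain Python. The function names `cov` and `flow_terms`, the argument order (recipient first, source second), and the synthetic test series (a pair of autoregressive processes in which the second drives the first, with hypothetical coefficients 0.5 and 0.8) are assumptions made for this example, not the code or the validation models used in the paper. The subscripts $C_{d1,1}$ and $C_{d1,2}$ are read as the covariances of $\dot{X}_1$ with $X_1$ and $X_2$.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def flow_terms(x1, x2, dt=1.0, k=1):
    """Estimate the entropy budget of x1 (recipient) under the flow from x2.

    Returns (T21, dH1_star, dH1_noise, tau21) following Eqs. (33)-(35).
    """
    # Forward difference approximating dX1/dt
    d1 = [(x1[n + k] - x1[n]) / (k * dt) for n in range(len(x1) - k)]
    a, b = x1[:len(d1)], x2[:len(d1)]
    c11, c12, c22 = cov(a, a), cov(a, b), cov(b, b)
    c1d1, c2d1, cd1d1 = cov(a, d1), cov(b, d1), cov(d1, d1)
    det = c11 * c22 - c12 ** 2
    p = (c22 * c1d1 - c12 * c2d1) / det
    q = (-c12 * c1d1 + c11 * c2d1) / det
    t21 = (c11 * c12 * c2d1 - c12 ** 2 * c1d1) / (c11 ** 2 * c22 - c11 * c12 ** 2)
    dh_star = p                                   # Eq. (33)
    dh_noise = dt / (2 * c11) * (                 # Eq. (34)
        cd1d1 + p * p * c11 + q * q * c22
        - 2 * p * c1d1 - 2 * q * c2d1 + 2 * p * q * c12
    )
    z = abs(dh_star) + abs(dh_noise) + abs(t21)   # normalizer
    return t21, dh_star, dh_noise, t21 / z


# Synthetic one-way coupling: x2 drives x1, x1 does not feed back into x2.
random.seed(0)
x1, x2 = [0.0], [0.0]
for _ in range(5000):
    e1, e2 = random.gauss(0, 1), random.gauss(0, 1)
    x1.append(0.5 * x1[-1] + 0.8 * x2[-1] + e1)  # uses x2 from the previous step
    x2.append(0.5 * x2[-1] + e2)

t21, dh1_star, dh1_noise, tau21 = flow_terms(x1, x2)
t12, dh2_star, dh2_noise, tau12 = flow_terms(x2, x1)
print("T(2->1) =", t21, " tau(2->1) =", tau21)
print("T(1->2) =", t12, " tau(1->2) =", tau12)
```

With this one-way coupling, the estimated flow from the second series to the first should stand well clear of zero while the reverse flow stays close to it. Swapping the two arguments is all that is needed to obtain the reverse flow and its own normalizer.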

It should be noted that a relative information flow is for comparison purposes within
its own series. The two reverse flows between two series can only be compared in terms
of absolute value, since they belong to different series. In this sense, absolute and relative
information flows should be examined simultaneously. This is clarified in the schematic
diagram in Fig. 5, and has been exemplified in the validations with two autoregressive
processes. It is quite normal that two identical information flows may differ a lot in relative
importance with respect to their own series, as testified in our realistic applications. In some
extreme situations, of a pair of equal flows, one may be dominant while the other is negligible in
their respective entropy balances.
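
A small numerical illustration of this point, using made-up budget terms rather than values from the applications: the same absolute flow is normalized against two different recipients. The function name `relative_flow` and all the numbers are hypothetical.

```python
def relative_flow(t, dh_star, dh_noise):
    """Normalize an absolute flow by the three-term entropy budget of its recipient."""
    z = abs(dh_star) + abs(dh_noise) + abs(t)
    return t / z


# The same absolute flow, T = 0.5, arriving at two different recipients.
tau_quiet = relative_flow(0.5, dh_star=0.25, dh_noise=0.25)   # small competing terms
tau_busy = relative_flow(0.5, dh_star=-20.0, dh_noise=29.5)   # large competing terms
print(tau_quiet, tau_busy)
```

In the first budget the flow makes up half of the total, while in the second it contributes about one percent, although the two absolute flows are identical.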

Partly for demonstration and partly for verification, we have presented two applications.
The first is a re-examination of the climate science problem previously studied in Ref. 0-
Considering the fadeout of the recent portentous predictions of a “super” or “monster” El
Nino, we have particularly focused on the predictability of El Nino. Our result reconfirmed
that the Indian Ocean SST is a source of uncertainty to the El Nino prediction. We further
clarified that the information flow from the Indian Ocean is mainly through the Indian
Ocean Dipole (IOD).

Another realistic problem we have examined regards the causation between a few randomly
picked American stocks. It is shown that for many flows (and hence causalities), though
significant at a 90% level, their respective importance relative to other mechanisms is
mostly negligible. The resulting matrices of absolute and relative information flows provide
us a pattern of causality mostly understandable using common sense. For example,
Ford has a larger influence on Wal-Mart than on CVS because people rely on motor vehicles
to shop at Wal-Mart, while a CVS store could be just within walking distance.
A particularly interesting case is that we have identified a strong one-way causality from
IBM to GE during the early stage of these companies. This has revealed to us the story of
the “Seven Dwarfs” competing with IBM, the giant, for the computer market. In an era when this story
has almost sunk into oblivion (one cannot even find it on GE’s website), and GE may have
left us an impression that it never built any computers, let alone a series of mainframes, this
finding is indeed remarkable.